Method and device for processing multi-view video data

ABSTRACT

A method for processing multi-view video data including, for at least one block of an image of a view encoded in an encoded data stream representing the multi-view video, obtaining at least one information item, which specifies a mode for obtaining a synthesis data item, from among first and second obtaining modes. The synthesis data item is used to synthesize at least one image of an intermediate view of the multi-view video, the intermediate view not being encoded in the encoded data stream. The first obtaining mode involves decoding an information item representing the synthesis data item from the encoded data stream; the second obtaining mode involves obtaining the synthesis data item from at least the reconstructed encoded image. At least one part of an image of the intermediate view is synthesized from at least the reconstructed encoded image and the synthesis data item obtained by the specified mode.

FIELD OF THE INVENTION

The invention relates to immersive videos, representative of a scene captured by one or more cameras, including videos for virtual reality and free navigation. More particularly, the invention relates to the processing (encoding, decoding, synthesis of intermediate views) of data from such videos.

PRIOR ART

An immersive video allows a viewer to watch a scene from any viewpoint, even from a viewpoint that has not been captured by a camera. A typical acquisition system is a set of cameras that captures a scene with several cameras located outside the scene or with divergent cameras built on a spherical platform. The videos are usually displayed via virtual reality headsets (also known as head-mounted devices, or HMDs), but can also be displayed on 2D screens with an additional system to interact with the user.

Free navigation in a scene requires that every movement of the user is properly managed in order to avoid motion sickness. The movement is usually correctly captured by the display device (an HMD headset, for example). However, providing the correct pixels for display, regardless of the movement of the user (rotational or translational), is currently problematic. This requires multiple captured views and the ability to generate additional virtual (synthesised) views, calculated from the decoded captured views and the associated depths. The number of views to be transmitted varies depending on the use cases. However, the number of views to be transmitted and the amount of associated data are often high. Consequently, the transmission of the views is an essential aspect of immersive video applications. It is therefore necessary to reduce the bit rate of the information to be transmitted as much as possible without compromising the quality of the synthesis of the intermediate views.

In a typical immersive video processing scheme, the views are physically captured or generated by computer. In some cases, the depths are also captured with dedicated sensors. However, the quality of this depth information is generally poor and prevents an effective synthesis of the intermediate viewpoints.

Depth maps can also be calculated from the texture images of the captured videos. Many depth estimation algorithms exist and are used in the state of the art.

The texture images and the estimated depth information are encoded and sent to a user's display device, as illustrated in FIG. 1. FIG. 1 shows an immersive video processing scheme comprising for example two captured views having respectively the texture information T_(x0y0) and T_(x1y0). Depth information D_(x0y0) and D_(x1y0) associated with each view T_(x0y0) and T_(x1y0) is estimated by an estimation module FE. For example, the depth information D_(x0y0) and D_(x1y0) is obtained by depth estimation software (Depth Estimation Reference Software, or DERS). The views T_(x0y0) and T_(x1y0) and the depth information D_(x0y0) and D_(x1y0) obtained are then encoded (CODEC), for example using an MV-HEVC encoder. On the client side, the views T*_(x0y0) and T*_(x1y0) and the associated depths of each view D*_(x0y0) and D*_(x1y0) are decoded and used by a synthesis algorithm (SYNTHESIS) to calculate intermediate views, for example here intermediate views S_(x0y0) and S_(x1y0). For example, the VSRS (View Synthesis Reference Software) software can be used as a view synthesis algorithm.

When the depth maps are calculated prior to encoding and transmitting the encoded data of an immersive video, various problems are encountered. In particular, the rate associated with the transmission of the various views is high. Although depth maps are generally less expensive than textures, they still represent a significant proportion of the bit stream (15% to 30% of the total).

In addition, complete depth maps are generated and sent, whereas on the client side, not all parts of all depth maps are necessarily useful. Indeed, the views can have redundant information, which makes some parts of depth maps unnecessary. In addition, in some cases, the viewers may request only specific viewpoints. Without a feedback channel between the client and the server providing the encoded immersive video, the depth estimator located on the server side is not aware of these specific viewpoints.

Calculating the depth information on the server side prevents any interaction between the depth estimator and the synthesis algorithm. For example, if a depth estimator wants to inform the synthesis algorithm that it cannot correctly find the depth of a specific area, it must transmit this information in the binary stream, most likely in the form of a binary map.

In addition, configuring the encoder to encode the depth maps so as to obtain the best compromise between synthesis quality and encoding cost for depth map transmission is not straightforward.

Finally, the number of pixels to be processed by a decoder is high when the textures and the depth maps are encoded, transmitted and decoded. This can slow down the deployment of immersive video processing schemes on terminals such as smartphones.

There is therefore a need to improve the prior art.

SUMMARY OF THE INVENTION

The invention improves the state of the art. For this purpose, it relates to a method for processing the data of a multi-view video, comprising:

-   obtaining, for at least one block of an image of a view encoded in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode,
-   said at least one synthesis data item being used to synthesise at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream,
-   said first obtaining mode corresponding to decoding at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image,
-   obtaining the at least one synthesis data item according to the obtaining mode specified by said at least one information item,
-   synthesising at least one part of an image of said intermediate view from at least said reconstructed encoded image and said at least one synthesis data item obtained.

The invention takes advantage of various modes for obtaining synthesis data in a flexible way, by allowing the selection of an optimal mode for obtaining each synthesis data item, for example in terms of encoding cost/quality of the synthesis data item, or depending on the tools available on the decoder side and/or encoder side. This selection is flexible since it can be done at the block, image, view or video level. The granularity of the mode for obtaining the synthesis data can therefore be adapted depending, for example, on the content of the multi-view video or on the tools available on the client/decoder side.

According to a first obtaining mode, the synthesis data item is determined on the encoder side, encoded and transmitted to a decoder in a data stream. According to this first obtaining mode, the quality of the synthesis data item can be privileged since it is determined from original, unencoded images. The synthesis data item is not subject to the encoding artifacts of the decoded textures during its estimation.

According to a second obtaining mode, the synthesis data item is determined on the decoder side. According to this second obtaining mode, the data necessary to synthesise intermediate views is obtained from the decoded and reconstructed views that have been transmitted to the decoder. Such synthesis data can be obtained at the decoder, or by a decoder-independent module taking as input the views decoded and reconstructed by the decoder. This second obtaining mode reduces the encoding cost of the multi-view video data and makes the decoding of the multi-view video easier, since the decoder no longer has to decode the data used for the intermediate view synthesis.

The invention also improves the quality of the intermediate view synthesis. Indeed, in some cases, a synthesis data item estimated at the decoder can be more appropriate for the synthesis of views than an encoded synthesis data item, for example when different estimators are available on the client side and server side. In other cases, determining the synthesis data item at the encoder may be more appropriate, for example when the decoded textures have compression artifacts or when the textures do not include enough redundant information to estimate the synthesis data on the client side.

According to a particular embodiment of the invention, said at least one synthesis data item corresponds to at least one part of a depth map.

According to another particular embodiment of the invention, said at least one information item specifying a mode for obtaining the synthesis data item is obtained by decoding a syntax element. According to this particular embodiment of the invention, the information item is encoded in the data stream.

According to another particular embodiment of the invention, said at least one information item specifying a mode for obtaining the synthesis data item is obtained from at least one encoded data item for the reconstructed encoded image. According to this particular embodiment of the invention, the information item is not directly encoded in the data stream; it is derived from the data encoded for an image in the data stream. The process for deriving the information item is identical here at the encoder and the decoder.

According to another particular embodiment of the invention, the obtaining mode is selected from among the first obtaining mode and the second obtaining mode depending on a value of a quantization parameter used to encode at least said block.

According to another particular embodiment of the invention, the method further comprises, when said at least one information item specifies that the synthesis data item is obtained according to the second obtaining mode:

-   decoding at least one control parameter from an encoded data stream,
-   applying said control parameter when obtaining said synthesis data item according to the second obtaining mode.

This particular embodiment of the invention makes it possible to control the method for obtaining the synthesis data item, for example to control the features of a depth estimator, such as the size of the search window or the precision. The control parameter can also specify which depth estimator to use, and/or the parameters of that estimator, or a depth map to initialise the estimator.

The invention also relates to a device for processing multi-view video data comprising a processor configured to:

-   obtain, for at least one block of an image of a view encoded in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode,
-   said at least one synthesis data item being used to synthesise at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream,
-   said first obtaining mode corresponding to decoding at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image,
-   obtain the at least one synthesis data item according to the obtaining mode specified by said at least one information item,
-   synthesise at least one part of an image of said intermediate view from at least said reconstructed encoded image and said at least one synthesis data item obtained.

According to a particular embodiment of the invention, the device for processing multi-view video data is comprised in a terminal.

The invention also relates to a method for encoding multi-view video data, comprising:

-   determining, for at least one block of an image of a view in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode,
-   said at least one synthesis data item being used to synthesise at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream,
-   said first obtaining mode corresponding to decoding at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image,
-   encoding said image in the encoded data stream.

According to a particular embodiment of the invention, the encoding method comprises encoding in the data stream a syntax element associated with said information item specifying a mode for obtaining the synthesis data item.

According to a particular embodiment of the invention, the encoding method further comprises, when the information item specifies that the synthesis data item is obtained according to the second obtaining mode:

-   encoding in an encoded data stream at least one control parameter to be applied when obtaining said synthesis data item according to the second obtaining mode.

The invention also relates to a multi-view video data encoding device, comprising a processor and a memory configured to:

-   determine, for at least one block of an image of a view in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode,
-   said at least one synthesis data item being used to synthesise at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream,
-   said first obtaining mode corresponding to decoding at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image,
-   encode said image in the encoded data stream.

The method for processing multi-view video data according to the invention can be implemented in various ways, notably in wired form or in software form. According to a particular embodiment of the invention, the method for processing multi-view video data is implemented by a computer program. The invention also relates to a computer program comprising instructions for implementing the method for processing multi-view video data according to any one of the particular embodiments previously described, when said program is executed by a processor. Such a program can be downloaded from a communication network and/or recorded on a computer-readable medium. It can use any programming language, and can be in the form of source code, object code, or intermediate code between source code and object code, such as in a partially compiled form, or in any other desirable form.

The invention also relates to a computer-readable storage medium or data medium comprising instructions of a computer program as mentioned above. The recording medium mentioned above can be any entity or device able to store the program. For example, the medium can comprise a storage means, such as a ROM, for example a CD-ROM or a microelectronic circuit ROM, a USB flash drive, or a magnetic recording means, for example a hard drive. On the other hand, the recording medium can correspond to a transmissible medium such as an electrical or optical signal, that can be carried via an electrical or optical cable, by radio or by other means. The program according to the invention can be downloaded in particular on an Internet-type network.

Alternatively, the recording medium can correspond to an integrated circuit in which the program is embedded, the circuit being adapted to execute or to be used in the execution of the method in question.

LIST OF FIGURES

Other characteristics and advantages of the invention will emerge more clearly upon reading the following description of a particular embodiment, provided as a simple illustrative non-restrictive example, and the annexed drawings, wherein:

FIG. 1 illustrates a multi-view video data processing scheme according to the prior art.

FIG. 2 illustrates a multi-view video data processing scheme according to a particular embodiment of the invention.

FIG. 3A illustrates steps of a method for processing multi-view video data according to a particular embodiment of the invention.

FIG. 3B illustrates steps of a method for processing multi-view video data according to another particular embodiment of the invention.

FIG. 4A illustrates steps of a multi-view video encoding method according to a particular embodiment of the invention.

FIG. 4B illustrates steps of a multi-view video encoding method according to particular embodiments of the invention.

FIG. 5 illustrates an example of a multi-view video data processing scheme according to a particular embodiment of the invention.

FIG. 6A illustrates a texture matrix of a multi-view video according to a particular embodiment of the invention.

FIG. 6B illustrates steps of the depth encoding method for a current block according to a particular embodiment of the invention.

FIG. 7A illustrates an example of a data stream according to a particular embodiment of the invention.

FIG. 7B illustrates an example of a data stream according to another particular embodiment of the invention.

FIG. 8 illustrates a multi-view video encoding device according to a particular embodiment of the invention.

FIG. 9 illustrates a device for processing multi-view video data according to a particular embodiment of the invention.

DESCRIPTION OF AN EMBODIMENT OF THE INVENTION

FIG. 1, described above, illustrates a multi-view video data processing scheme according to the prior art. According to this scheme, the depth information is determined, encoded and transmitted in a data stream to the decoder that decodes it.

FIG. 2 illustrates a multi-view video data processing scheme according to a particular embodiment of the invention. According to this particular embodiment of the invention, the depth information is not encoded in the data stream, but determined on the client side, from the reconstructed images of the multi-view video.

According to the scheme illustrated in FIG. 2, the texture images T_(x0y0) and T_(x1y0) originating from captured views are encoded (CODEC), for example using an MV-HEVC encoder, and sent to a user's display device, for example. On the client side, the textures T*_(x0y0) and T*_(x1y0) of the views are decoded and used to estimate the depth information D′_(x0y0) and D′_(x1y0) associated with each view T_(x0y0) and T_(x1y0), by an estimation module FE. For example, the depth information D′_(x0y0) and D′_(x1y0) is obtained by depth estimation software (DERS).

The decoded views T*_(x0y0) and T*_(x1y0) and the associated depths of each view D′_(x0y0) and D′_(x1y0) are used by a synthesis algorithm (SYNTHESIS) to calculate intermediate views, for example here intermediate views S′_(x0y0) and S′_(x1y0). For example, the above-mentioned VSRS software can be used as a view synthesis algorithm.

When the depth information is estimated after transmitting the encoded data of the multi-view video, the following problems may be encountered. Due to compression artifacts (for example block effects or quantization noise) present in the decoded textures used to estimate the depth information, especially at low rates, incorrect depth values can be obtained.

In addition, the complexity of the client terminal is greater than when the depth information is transmitted to the decoder. This may imply using simpler depth estimation algorithms at the decoder, which may then fail in complex scenes.

On the client side, it may also happen that the texture information does not include enough redundancy to estimate the depth or other data useful for synthesis, for example because some texture information was not encoded during server-side encoding.

The invention proposes a method for selecting a mode for obtaining the synthesis data from among a first obtaining mode (M1), according to which the synthesis data is encoded and transmitted to the decoder, and a second obtaining mode (M2), according to which the synthesis data is estimated on the client side. This method takes advantage of both approaches in a flexible way.

For this purpose, the best way to obtain one or more synthesis data items is selected for each image, for each block, or at any other granularity.

FIG. 3A illustrates steps of a method for processing multi-view video data according to a particular embodiment of the invention. According to this particular embodiment of the invention, the selected obtaining mode is encoded and transmitted to the decoder.

A data stream BS comprising in particular texture information of one or more views of a multi-view video is transmitted to the decoder. For example, it is considered that two views have been encoded in the data stream BS.

The data stream BS also comprises at least one syntax element representative of an information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode M1 and a second obtaining mode M2.

In a step 30, the decoder decodes the texture information of the data stream to obtain the textures T*₀ and T*₁.

In a step 31, the syntax element representative of the information item specifying an obtaining mode is decoded from the data stream. This syntax element is encoded in the data stream for at least one block of the texture image of a view. Its value can therefore change at each texture block of a view. According to another variant, the syntax element is encoded once for all blocks of the texture image of a view T₀ or T₁. The information item specifying a mode for obtaining a synthesis data item is thus the same for all blocks of the texture image T₀ or T₁.

In yet another variant, the syntax element is encoded once for all texture images of the same view, or the syntax element is encoded once for all views.

The variant according to which the syntax element is encoded for each texture image of a view is considered here. Following step 31, an obtaining mode information item d₀ associated with the decoded texture image T*₀ and an obtaining mode information item d₁ associated with the decoded texture image T*₁ are then obtained.

In a step 32, it is checked, for each information item d₀ and d₁ specifying a mode for obtaining the synthesis data associated respectively with the decoded texture images T*₀ and T*₁, whether the obtaining mode corresponds to the first obtaining mode M1 or to the second obtaining mode M2.

If the information item d₀, respectively d₁, specifies the first obtaining mode M1, in a step 34, the synthesis data D*₀, respectively D*₁, associated with the decoded texture image T*₀, respectively T*₁, is decoded from the data stream BS.

If the information item d₀, respectively d₁, specifies the second obtaining mode M2, in a step 33, the synthesis data D⁺₀, respectively D⁺₁, associated with the decoded texture image T*₀, respectively T*₁, is estimated from the reconstructed texture images of the multi-view video. For this purpose, the estimation can use the decoded texture T*₀, respectively T*₁, and possibly other previously reconstructed texture images.

In a step 35, the decoded textures T*₀ and T*₁ and the decoded (D*₀, D*₁) or estimated (D⁺₀, D⁺₁) synthesis information are used to synthesise an image of an intermediate view S0.5.
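
By way of illustration, the decoding flow of steps 30 to 35 can be sketched as follows. This is a minimal sketch in which the texture decoder, depth decoder, depth estimator and synthesiser are trivial stand-ins; only the branching on the obtaining mode mirrors the described method, and all names are illustrative rather than part of any codec.

```python
# Minimal sketch of the decoding flow of FIG. 3A (steps 30 to 35).
# The helpers are trivial stand-ins, not a real codec API; only the
# branching on the obtaining mode d_k mirrors the described method.

M1, M2 = 0, 1  # first / second obtaining mode

def decode_texture(stream, v):    # step 30: texture decoding (stand-in)
    return stream["textures"][v]

def decode_mode_flag(stream, v):  # step 31: decode syntax element d_v
    return stream["d"][v]

def decode_depth(stream, v):      # step 34: first obtaining mode M1
    return stream["depths"][v]

def estimate_depth(textures, v):  # step 33: second obtaining mode M2
    return [[0 for _ in row] for row in textures[v]]  # dummy estimate

def synthesize(textures, depths, position):  # step 35 (stand-in)
    return textures[0]  # dummy intermediate view

# Toy "stream" with two views: view 0 uses M1, view 1 uses M2.
bs = {"textures": [[[128]], [[130]]], "d": [M1, M2], "depths": [[[64]], None]}
t = [decode_texture(bs, v) for v in range(2)]
d = [decode_depth(bs, v) if decode_mode_flag(bs, v) == M1
     else estimate_depth(t, v) for v in range(2)]
s_05 = synthesize(t, d, position=0.5)  # image of the intermediate view S0.5
```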

FIG. 3B illustrates steps of a method for processing multi-view video data according to another particular embodiment of the invention. According to this other particular embodiment of the invention, the selected obtaining mode is not transmitted to the decoder. The decoder derives the obtaining mode from the previously decoded texture data.

A data stream BS comprising in particular texture information of one or more views of a multi-view video is transmitted to the decoder. For example, it is considered that two views have been encoded in the data stream BS.

In a step 30′, the decoder decodes the texture information of the data stream to obtain the textures T*₀ and T*₁.

In a step 32′, the decoder obtains an information item specifying a mode, from among a first obtaining mode M1 and a second obtaining mode M2, for obtaining at least one synthesis data item to be used to synthesise an image of an intermediate view. According to a variant, this information item can be obtained for each block of the texture image of a view. The obtaining mode can thus change at each texture block of a view.

According to another embodiment, this information item is obtained once for all blocks of the texture image of a view T*₀ or T*₁. The information item specifying a mode for obtaining a synthesis data item is thus the same for all blocks of the texture image T*₀ or T*₁.

According to yet another variant, the information item is obtained once for all texture images of the same view, or the information item is obtained once for all views.

The variant according to which the information item is obtained for each texture image of a view is considered here. Following step 32′, an obtaining mode information item d₀ associated with the decoded texture image T*₀ and an obtaining mode information item d₁ associated with the decoded texture image T*₁ are then obtained. The obtaining mode information item is obtained here by applying the same determination process as that applied at the encoder. An example of such a determination process is described later in relation to FIG. 4B.

Following step 32′, if the information item d₀, respectively d₁, specifies the first obtaining mode M1, in a step 34′, the synthesis data D*₀, respectively D*₁, associated with the decoded texture image T*₀, respectively T*₁, is decoded from the data stream BS.

If the information item d₀, respectively d₁, specifies the second obtaining mode M2, in a step 33′, the synthesis data D⁺₀, respectively D⁺₁, associated with the decoded texture image T*₀, respectively T*₁, is estimated from the reconstructed texture images of the multi-view video. For this purpose, the estimation can use the decoded texture T*₀, respectively T*₁, and possibly other previously reconstructed texture images.

In a step 35′, the decoded textures T*₀ and T*₁ and the decoded (D*₀, D*₁) or estimated (D⁺₀, D⁺₁) synthesis information are used to synthesise an image of an intermediate view S0.5.

The method for processing multi-view video data described here according to particular embodiments of the invention can notably be applied in the case where the synthesis data corresponds to depth information. However, the data processing method is applicable to any type of synthesis data, such as an object segmentation map.

It is possible, for a given view at a given time of the video, to apply the method described above to several types of synthesis data. For example, if the synthesis module is assisted by a depth map and an object segmentation map, these two types of synthesis data can be partially transmitted to the decoder, the other part being derived by the decoder or the synthesis module.

It should also be noted that one part of the texture can be estimated, for example by interpolation. The view corresponding to such a texture estimated at the decoder is considered in this case as a synthesis data item.

The examples described here include two texture views, respectively producing two depth maps, but other combinations are of course possible, including processing a depth map at a given time, associated with one or more texture views.

FIG. 4A illustrates steps of a multi-view video encoding method according to a particular embodiment of the invention. The encoding method is described here in the case of two views comprising respectively the textures T₀ and T₁.

In a step 40, each texture T₀ and T₁ is encoded and decoded to provide the decoded textures T*₀ and T*₁. It should be noted that the textures can correspond here to an image of a view, or a block of an image of a view, or any other type of granularity related to texture information of a multi-view video.

In a step 41, synthesis data, for example depth maps D⁺₀ and D⁺₁, are estimated from the decoded textures T*₀ and T*₁, using a depth estimator. This is the second obtaining mode M2 for the synthesis data.

In a step 42, the synthesis data D₀ and D₁ is estimated from the uncoded textures T₀ and T₁, for example using a depth estimator. In a step 43, the obtained synthesis data D₀ and D₁ are then encoded and decoded to provide reconstructed synthesis data D*₀ and D*₁. This is the first obtaining mode M1 for the synthesis data.

In a step 44, an obtaining mode to be used at the decoder to obtain the synthesis data is determined from among the first obtaining mode M1 and the second obtaining mode M2.

According to a particular embodiment of the invention, a syntax element is encoded in the data stream to specify the selected obtaining mode. According to this particular embodiment of the invention, various variants are possible depending on how the rate and the distortion are evaluated according to the criterion to be minimised, J = D + λR, where R corresponds to the rate, D corresponds to the distortion and λ is the Lagrangian used for the optimisation.

A first variant consists in synthesising an intermediate view, or a block of an intermediate view in the case where the obtaining mode is encoded for each block, and in evaluating the quality of the synthesised view for the two modes for obtaining the synthesis data. A first version of the intermediate view is thus synthesised for the obtaining mode M2 from the decoded textures T*₀ and T*₁ and the synthesis data D⁺₀ and D⁺₁ estimated from the decoded textures T*₀ and T*₁. The rate then corresponds to the encoding cost of the textures T*₀ and T*₁ and to the encoding cost of the syntax element specifying the selected obtaining mode. This rate can be calculated precisely using, for example, an entropy encoder (for example, binary arithmetic coding or variable-length coding, with or without context adaptation). A second version of the intermediate view is also synthesised for the obtaining mode M1 from the decoded textures T*₀ and T*₁ and the decoded synthesis data D*₀ and D*₁. The rate then corresponds to the encoding cost of the textures T*₀ and T*₁ and of the synthesis data D*₀ and D*₁, to which the encoding cost of the syntax element specifying the selected obtaining mode is added. This rate can be calculated as specified above.

In both cases, the distortion can be calculated by a metric comparing the image or block of the synthesised view with the same image or block synthesised from the uncoded textures T₀ and T₁ and the uncoded synthesis data D₀ and D₁.

The obtaining mode providing the lowest rate/distortion cost J is selected.
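
As an illustration, this selection can be sketched as follows, assuming the two distortions and the two rates have already been measured as described above; the function name, its arguments and the example values are hypothetical.

```python
# Sketch of the rate/distortion selection between the two obtaining
# modes, minimising J = D + lambda * R. The inputs are assumed to have
# been measured as described above (synthesis quality and coding costs).

def select_obtaining_mode(d_m1, r_m1, d_m2, r_m2, lam):
    # M1: rate includes textures, synthesis data and the syntax element
    j_m1 = d_m1 + lam * r_m1
    # M2: rate includes textures and the syntax element only
    j_m2 = d_m2 + lam * r_m2
    return "M1" if j_m1 <= j_m2 else "M2"

# Example: M2 wins here because its lower rate outweighs its distortion.
mode = select_obtaining_mode(d_m1=10.2, r_m1=6000, d_m2=11.8, r_m2=3600, lam=0.001)
```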

According to another variant, it is possible to determine the distortion by applying a no-reference metric to the synthesised image or block, to avoid using the original uncompressed texture. Such a no-reference metric can for example measure the amount of noise, blur or block effects, or the sharpness of edges, in the synthesised image or block.

According to another variant, the obtaining mode is selected for example by comparing the synthesis data D₀ and D₁ estimated from the uncompressed textures and the synthesis data D⁺₀ and D⁺₁ estimated from the encoded-decoded textures. If the synthesis data are close enough, according to a defined criterion, estimating the synthesis data on the client side will be more efficient than encoding and transmitting them. According to this variant, the synthesis of an image or a block of an intermediate view is avoided.

Other variants are also possible to determine a mode for obtaining the synthesis data, when it corresponds to depth maps. The selection of an obtaining mode can for example depend on the characteristics of the depth information item. For example, a computer-generated depth information item or a high-quality captured depth is more likely to be suitable for the obtaining mode M1. According to this variant, the depth maps can also be estimated from the decoded textures as described above and placed in competition with the computer-generated or high-quality captured depth maps. The computer-generated or high-quality captured depth maps then replace the depth maps estimated from the uncompressed textures in the method described above.

According to another variant, the depth quality can be used to determine a mode for obtaining the synthesis data. The depth quality, which can be measured by an appropriate objective metric, may include relevant information. For example, when the depth quality is low, or when the temporal coherence of the depth is low, it is likely that the obtaining mode M2 is the most suitable for obtaining the depth information.

Once the mode for obtaining the synthesis data is selected at the end of step 44, in a step 45, a syntax element d representative of the selected obtaining mode is encoded in the data stream. When the selected and encoded mode corresponds to the first obtaining mode M1, the synthesis data D₀ and D₁ is also encoded in the data stream, for the considered block or image.

According to a particular embodiment of the invention, when the selected and encoded mode corresponds to the second obtaining mode M2, in a step 46, additional information can also be encoded in the data stream. For example, such information can correspond to one or more control parameters to be applied by the decoder or by a synthesis module when obtaining said synthesis data item according to the second obtaining mode. These parameters can be used to control a synthesis data or depth estimator, for example.

For example, the control parameters can control the features of a depth estimator, such as increasing or decreasing the search interval, or increasing or decreasing the precision.

The control parameters can specify how a synthesis data item should be estimated on the decoder side. For example, the control parameters specify which depth estimator to use. For example, in step 41 for estimating the depth maps, the encoder can test several depth estimators and select the estimator providing the best rate/distortion compromise from among: a pixel-based depth estimator, a triangle-warping based depth estimator, a fast depth estimator, a monocular neural network depth estimator, a neural network depth estimator using multiple references. According to this variant, the encoder notifies the decoder or the synthesis module to use a similar synthesis data estimator.

According to another variant, or in addition to the previous variant, the control parameters can include parameters of a depth estimator such as the disparity interval, the precision, the neural network model, the optimisation or aggregation method, the smoothing factors of the energy functions, the cost functions (colour-based, correlation-based, frequency-based), a simple depth map that can be used as initialisation for the client-side depth estimator, etc.
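
By way of illustration only, such control parameters could be grouped into a structure like the following; all field names and default values are invented for the sketch and do not correspond to any actual codec syntax.

```python
# Illustrative grouping of the control parameters PAR mentioned above.
# Field names and defaults are invented; an actual codec would define
# its own syntax and semantics for these parameters.
from dataclasses import dataclass
from typing import Optional, Tuple, List

@dataclass
class DepthEstimatorParams:
    estimator: str = "pixel"              # "pixel", "triangle_warp", "fast",
                                          # "mono_nn" or "multi_ref_nn"
    disparity_range: Tuple[int, int] = (0, 128)   # search interval
    precision: float = 0.25               # estimation precision (sub-pixel)
    nn_model: Optional[str] = None        # neural network model identifier
    aggregation: str = "winner_take_all"  # optimisation/aggregation method
    smoothing: float = 1.0                # smoothing factor of the energy function
    cost_function: str = "colour"         # "colour", "correlation" or "frequency"
    init_depth: Optional[List[List[int]]] = None  # coarse initialisation map
```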

FIG. 4B illustrates steps of a multi-view video encoding method according to another particular embodiment of the invention. According to the particular embodiment described here, the mode for obtaining the synthesis data is not encoded in the data stream, but deduced from the encoded information that will be available to the decoder.

The encoding method is described here in the case of two views comprising respectively the textures T₀ and T₁.

In a step 40′, each texture T₀ and T₁ is encoded and decoded to provide the decoded textures T*₀ and T*₁. It should be noted that the textures can correspond here to an image of a view, or a block of an image of a view, or any other type of granularity related to texture information of a multi-view video.

In a step 44′, an obtaining mode to be used at the decoder to obtain the synthesis data is determined from among the first obtaining mode M1 and the second obtaining mode M2.

According to the particular embodiment described here, the encoder can use any information item that will be available to the decoder to decide on the obtaining mode to be applied to the considered block or image.

According to a variant, the obtaining mode can be selected based on a quantization parameter, for example a QP used to encode an image or a texture block. For example, when the quantization parameter is greater than a given threshold, the second obtaining mode is selected; otherwise the first obtaining mode is selected.
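
A minimal sketch of this QP-based derivation follows; the threshold value is an assumption for illustration (an actual codec would fix or signal it).

```python
# Sketch of the QP-based derivation of the obtaining mode: a coarsely
# quantized block (high QP) suggests estimating the depth at the decoder
# (M2) rather than spending rate on it (M1). Threshold is illustrative.

QP_THRESHOLD = 32  # assumed value; would be fixed by the codec or signalled

def derive_obtaining_mode(qp: int) -> str:
    return "M2" if qp > QP_THRESHOLD else "M1"

# Both encoder and decoder run the same rule, so no syntax element is needed.
assert derive_obtaining_mode(40) == "M2" and derive_obtaining_mode(22) == "M1"
```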

According to another variant, when the synthesis data corresponds to depth information, the synthesis data D₀ and D₁ can be generated by computer or captured in high quality. This type of synthesis data is more suitable for the obtaining mode M1. Thus, when this is the case, the selected mode for obtaining the synthesis data will be the obtaining mode M1. According to this variant, a metadata item must be transmitted to the decoder to specify the origin of the depth (computer-generated, captured in high quality). This information item can be transmitted at the view sequence level.

At the end of step 44′, if the first obtaining mode M1 is selected, in a step 42′, the synthesis data D₀ and D₁ is estimated from the uncoded textures T₀ and T₁, for example using a depth estimator. This estimation is of course not carried out in the case where the synthesis data originates from computer generation or high-quality capture.

In a step 47′, the synthesis data D₀ and D₁ obtained is then encoded in the data stream.

When the selected obtaining mode corresponds to the second obtaining mode M2, according to a particular embodiment of the invention, additional information can also be encoded in the data stream, in a step 46′. For example, such information can correspond to one or more control parameters to be applied by the decoder or by a synthesis module when obtaining said synthesis data item according to the second obtaining mode. Such control parameters are similar to those described in relation to FIG. 4A.

FIG. 5 illustrates an example of a multi-view video data processing scheme according to a particular embodiment of the invention.

According to a particular embodiment of the invention, a scene is captured by a video capture system CAPT. For example, the view capture system includes one or more cameras capturing the scene.

According to the example described here, the scene is captured by two converging cameras, located outside the scene and looking towards the scene from two distinct locations. The cameras are therefore at different distances from the scene and have different angles/orientations. Each camera provides a sequence of uncompressed images. The sequences of images comprise the sequences of texture images T₀ and T₁ respectively.

The texture images T₀ and T₁ from the sequences of images provided by the two cameras respectively are encoded by an encoder COD, for example an MV-HEVC encoder, which is a multi-view video encoder. The encoder COD provides a data stream BS that is transmitted to a decoder DEC, for example via a data network.

During encoding, the depth maps D₀ and D₁ are estimated from the uncompressed textures T₀ and T₁, and the depth maps D⁺₀ and D⁺₁ are estimated from the decoded textures T*₀ and T*₁, using a depth estimator, for example the DERS estimator. A first view T′₀ located at a position captured by one of the cameras is synthesised, for example here the position 0, using the depth map D₀, and a second view T″₀ located at the same position is synthesised using the depth map D⁺₀. The quality of the two synthesised views is compared, for example by calculating the PSNR (Peak Signal to Noise Ratio) between each of the synthesised views T′₀, T″₀ and the captured view T₀ located at the same position. The comparison allows the selection of an obtaining mode for the depth map D₀ from among a first obtaining mode according to which the depth map D₀ is encoded and transmitted to the decoder and a second obtaining mode according to which the depth map D⁺₀ is estimated at the decoder. The same method is iterated for the depth map D₁ associated with the captured texture T₁.
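
For illustration, this PSNR comparison could be sketched as follows on flat pixel arrays; psnr and choose_obtaining_mode are hypothetical helpers, not part of any reference software.

```python
# Sketch of the PSNR-based comparison: the view T'0 synthesised with the
# source-estimated depth D0 and the view T''0 synthesised with the
# decoder-side depth D+0 are both compared against the captured view T0.
import math

def psnr(ref, test, peak=255.0):
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

def choose_obtaining_mode(t0, t_prime_0, t_second_0):
    # M1 (encode and transmit D0) if the encoder-side depth synthesises
    # the captured position better, otherwise M2 (estimate at the decoder)
    return "M1" if psnr(t0, t_prime_0) > psnr(t0, t_second_0) else "M2"

# Toy example on 1D "images":
mode = choose_obtaining_mode([100, 120, 140], [101, 119, 141], [104, 116, 145])
```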

FIG. 7A illustrates an example of one part of a data stream BS according to this particular embodiment of the invention. The data stream BS comprises the encoded textures T₀ and T₁ and the syntax elements d₀ and d₁ specifying respectively, for each of the textures T₀ and T₁, the mode for obtaining the depth maps D₀ and D₁.

If it is decided to encode and transmit the depth map D₀, respectively D₁, the value of the syntax element d₀, respectively d₁, is for example 0, and the data stream BS comprises the encoded depth map D₀, respectively D₁.

If it is decided not to encode the depth map D₀, respectively D₁, the value of the syntax element d₀, respectively d₁, is for example 1, and the data stream BS does not comprise the depth map D₀, respectively D₁. It can possibly comprise, according to the embodiment variants, one or more control parameters PAR to be applied when obtaining the depth map D⁺₀, respectively D⁺₁, by the decoder or by the synthesis module.
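
The resulting stream layout of FIG. 7A can be sketched schematically as follows; the serialisation is purely illustrative and does not reflect an actual bitstream syntax.

```python
# Schematic sketch of the FIG. 7A stream layout: for each view, the
# encoded texture, the syntax element d (0: depth encoded in the stream,
# 1: depth estimated at the decoder) and either the encoded depth map or
# optional control parameters PAR. Purely illustrative serialisation.

def write_view(stream, texture, d, depth=None, par=None):
    stream.append(("texture", texture))
    stream.append(("d", d))
    if d == 0:                  # first obtaining mode: depth transmitted
        stream.append(("depth", depth))
    elif par is not None:       # second obtaining mode: optional PAR
        stream.append(("PAR", par))

bs = []
write_view(bs, texture="T0", d=0, depth="D0")               # D0 transmitted
write_view(bs, texture="T1", d=1, par={"precision": 0.25})  # D1 estimated
```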

The encoded data stream BS is then decoded by the decoder DEC. For example, the decoder DEC is included in a smartphone equipped with free navigation decoding features. According to this example, a user looks at the scene from the viewpoint provided by the first camera. Then, the user slowly slides their viewpoint to the left towards the other camera. During this process, the smartphone displays intermediate views of the scene that have not been captured by the cameras.

For this purpose, the data stream BS is scanned and decoded, by an MV-HEVC decoder for example, to provide two decoded textures T*₀ and T*₁. The syntax element d_(k), with k=0 or 1, associated with each texture is decoded. If the value of the syntax element d_(k) is 0, then the decoder decodes the depth map D*_(k) from the data stream BS.

If the value of the syntax element d_(k) is 1, the depth map D⁺_(k) is estimated at the decoder or by a synthesis module from the decoded textures T*₀ and T*₁.

A synthesis module SYNTH, for example based on a VVS (Versatile View Synthesizer) synthesis algorithm, uses the decoded textures T*₀ and T*₁ and, as appropriate, the decoded (D*₀, D*₁) or estimated (D⁺₀, D⁺₁) depth maps to synthesise intermediate views located between the views corresponding to the textures T₀ and T₁.

The multi-view video data processing scheme described in FIG. 5 is not limited to the embodiment described above.

According to another particular embodiment of the invention, the scene is captured by six omnidirectional cameras located in the scene, at six different locations. Each camera provides a sequence of 2D images in an equirectangular projection (ERP) format. The six textures from the cameras are encoded using a 3D-HEVC encoder, which is a multi-view encoder, providing a data stream BS that is for example transmitted via a data network. When encoding the multi-view sequence, a 2×3 matrix of source textures T (textures from the cameras) is provided as input to the encoder. FIG. 6A illustrates such a texture matrix comprising the textures T_(xiyj) with i=0, 1 or 2 and j=0 or 1.

According to the embodiment described here, a source depth map matrix D is estimated from the uncompressed textures using a depth estimator based on a neural network. The texture matrix T is encoded and decoded using the 3D-HEVC encoder, providing the decoded texture matrix T*. The decoded texture matrix T* is used to estimate the depth map matrix D⁺ using the depth estimator based on the neural network.

According to the particular embodiment of the invention described here, an obtaining mode for the depth map associated with a texture is selected for each encoding block or unit (also known as CTU, for Coding Tree Unit, in the HEVC encoder).

FIG. 6B illustrates the steps of the depth encoding method for a current block D_(x0y0)(x,y,t) to be coded, where x,y corresponds to the position of the upper left corner of the block in the image and t to the time instant of the image.

The first block of the first depth map to be coded at time t=0 of the video sequence, identified by D_(x0y0)(0,0,0) hereafter, is considered. When encoding this first block D_(x0y0)(0,0,0), the depth for all other blocks that have not yet been processed is assumed to originate from the estimated source depth D. The other blocks that have not yet been processed belong both to the current view x0y0 and to the other neighbouring views.

The depth encoding for the current block D_(x0y0)(0,0,0) is first evaluated by determining an optimal encoding mode from among various block depth encoding tools available to the encoder. Such encoding tools can include any type of depth encoding tool available in a multi-view encoder.

In a step 60, the depth D_(x0y0)(0,0,0) of the current block is encoded using a first encoding tool, providing an encoded-decoded depth D*_(x0y0)(0,0,0) for the current block.

In a step 61, a view at a position of one of the cameras is synthesised, using the VVS synthesis software for example. For example, a view at position x1y0 is synthesised using the views decoded at positions x0y0, x2y0 and x1y1 of the texture matrix T*. During the synthesis of the view, the depth for all blocks of the multi-view video that have not yet been processed originates from the estimated source depth D. The depth for all blocks of the multi-view video for which the depth has already been encoded originates from the depth that was encoded-decoded or estimated from the decoded textures, according to the obtaining mode selected for each block. The depth of the current block used for the synthesis of the view at position x1y0 is the encoded-decoded depth D*_(x0y0)(0,0,0) according to the encoding tool being evaluated. In a step 62, the quality of the synthesised view is evaluated using an error metric, such as a squared error, between the synthesised view at position x1y0 and the source view T_(x1y0), and the encoding cost of the current block depth according to the tested tool is calculated.

In a step 63, it is checked whether all depth encoding tools have been tested for the current block; if not, steps 60 to 62 are iterated for the next encoding tool, otherwise the method proceeds to step 64.

In step 64, the depth encoding tool providing the best rate/distortion compromise is selected, for example the one that minimises the rate/distortion criterion J = D + λR.

In a step 65, another view at the same position as in step 61 is synthesised using the decoded textures at positions x0y0, x2y0 and x1y1 with the VVS software and the estimated depth of the current block D⁺_(x0y0)(0,0,0).

In a step 66, the distortion between the synthesised view at position x1y0 and the source view T_(x1y0) is calculated, and the depth encoding cost is set to 0, since according to this obtaining mode the depth is not encoded but estimated at the decoder.

In a step 67, the optimal obtaining mode is decided according to the rate/distortion cost of each mode for obtaining the depth. In other words, the mode for obtaining the depth that minimises the rate/distortion criterion is selected from among encoding the depth with the optimal encoding tool selected in step 64 and estimating the depth at the decoder.

In a step 68, a syntax element specifying the selected obtaining mode for the current block is encoded in the data stream. If the selected obtaining mode corresponds to depth encoding, the depth is encoded in the data stream according to the optimal encoding tool selected previously. Steps 60 to 68 are iterated considering the next block to be processed, D_(x0y0)(64,0,0) for example if the first block has a size of 64×64. All blocks in the depth map associated with the view texture at position x0y0 are processed in the same way, taking into account the encoded-decoded or estimated depths of the blocks previously processed during the synthesis of the view.

The depth maps of the other views are also processed in a similar way.
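
The per-block decision of steps 60 to 68 can be condensed into the following sketch, in which the distortions and rates of each candidate are assumed to have been measured through synthesis as described above; the function, its inputs and the example values are hypothetical, not the actual encoder.

```python
# Condensed sketch of the per-block decision of FIG. 6B (steps 60 to 68).
# tool_costs maps each depth encoding tool to the (distortion, depth rate)
# measured through synthesis (steps 60 to 62); d_m2 is the distortion of
# the synthesis with the decoder-estimated depth, whose depth rate is 0
# (steps 65 and 66). All values are assumed to be measured beforehand.

LAM = 0.001  # illustrative Lagrangian of J = D + lambda * R

def decide_block(tool_costs, d_m2):
    best_tool, best_j = None, float("inf")
    for tool, (dist, rate) in tool_costs.items():   # steps 60 to 63
        j = dist + LAM * rate                       # step 62
        if j < best_j:
            best_tool, best_j = tool, j             # step 64
    j_m2 = d_m2                                     # step 66: depth rate is 0
    return ("M2", None) if j_m2 < best_j else ("M1", best_tool)  # step 67

# Example: estimation at the decoder beats both encoding tools here.
mode, tool = decide_block({"wedgelet": (9.0, 4000), "dc": (9.5, 2500)}, d_m2=10.5)
```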

According to this particular embodiment of the invention, the encoded data stream further comprises different information for each block. If it has been decided to encode and transmit the depth for a given block, the data stream includes for this block the encoded texture of the block, a block of encoded depth data and the syntax element specifying the mode for obtaining the depth for the block.

If it has been decided not to encode the depth for the block, the data stream includes for the block the encoded texture of the block, a block of depth information filled with the same grey level value, and the syntax element specifying the mode for obtaining the depth for the block.

It should be noted that in some cases, the data stream can comprise the consecutively encoded textures for all blocks, then the depth data and the syntax elements of the blocks.

The decoding can for example be carried out via a virtual reality headset equipped with free navigation features and worn by a user. The viewer looks at the scene from the viewpoint provided by one of the six cameras. The user looks around and slowly starts to move around the scene. The headset follows the user's movement and displays corresponding views of the scene that have not been captured by the cameras.

For this purpose, the decoder DEC decodes the texture matrix T* from the encoded data stream. The syntax elements for each block are also decoded from the encoded data stream. The depth of each block is obtained by decoding the block of depth data encoded for the block, or by estimating the depth data from the decoded textures, according to the value of the syntax element decoded for the block.

An intermediate view is synthesised using the decoded texture matrix T* and the reconstructed depth matrix comprising, for each block, the depth data obtained according to the obtaining mode specified by the syntax element decoded for the block.

According to another particular embodiment of the invention, the multi-view video data processing scheme described in FIG. 5 also applies in the case where the syntax element is not encoded at the block level or at the image level.

For example, the encoder COD can apply a decision mechanism at the image level to decide whether the depth should be transmitted to the decoder or estimated after decoding.

For this purpose, the encoder, which operates in variable rate mode, allocates quantization parameters (QPs) to the blocks of texture images in a known manner so as to achieve a target overall rate.

An average of the QPs allocated to each block of a texture image is calculated, possibly using a weighting between blocks. This provides an average QP for the texture image, representative of a level of importance of the image.

If the average QP obtained is above a certain threshold, it means that the target rate is a low rate. The encoder then decides to calculate the depth map for this texture image from the uncompressed textures of the multi-view video, encode the calculated depth map, and transmit it in the data stream.

If the average QP is below or equal to the determined threshold, the target rate is a high rate. The encoder does not calculate the depth for this texture image and proceeds to the next texture image. No depth is encoded for this image, nor is any indicator transmitted to the decoder.
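
This image-level decision can be sketched as follows; since the decoder can compute the same average QP, it can replay the decision without any dedicated syntax element. The threshold value and the uniform default weighting are illustrative assumptions.

```python
# Sketch of the image-level decision based on the average QP of the
# texture blocks (optionally weighted). The same rule is run by the
# encoder and, as described below, by the decoder. Threshold illustrative.

QP_THRESH = 32  # assumed value; may be transmitted or known to the decoder

def average_qp(block_qps, weights=None):
    weights = weights or [1.0] * len(block_qps)
    return sum(q * w for q, w in zip(block_qps, weights)) / sum(weights)

def depth_is_transmitted(block_qps) -> bool:
    # average QP above the threshold: low target rate, the depth map is
    # calculated, encoded and transmitted; otherwise it is not encoded
    return average_qp(block_qps) > QP_THRESH
```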

FIG. 7B illustrates an example of one part of an encoded data stream according to this particular embodiment of the invention.

The encoded data stream notably comprises the encoded textures for each image, here T₀ and T₁. The encoded data stream also comprises information to obtain the average QP of each image. For example, this can be encoded at the image level, or obtained conventionally from the QPs encoded for each block in the data stream.

For each texture image T₀ and T₁, the encoded data stream also comprises the calculated and encoded depth data D₀ and/or D₁ according to the decision made at the encoder. It should be noted here that the syntax elements d₀ and d₁ are not encoded in the data stream. When it has been decided to estimate the depth for a texture image at the decoder, the data stream can comprise parameters PAR to be applied when estimating the depth. These parameters have already been described above.

The decoder DEC scans the encoded data stream and decodes the texture images T*₀ and T*₁. The decoder applies the same decision mechanism as the encoder, calculating the average QP of each texture image. The decoder then deduces, using the determined threshold, which can be transmitted in the data stream or known to the decoder, whether the depth for a given texture image should be decoded or estimated.

The decoder then operates in a manner similar to that described in relation to the first embodiment of FIG. 5.

FIG. 8 shows the simplified structure of an encoding device COD adapted to implement the encoding method according to any one of the particular embodiments of the invention previously described, in particular in relation to FIGS. 2, 4A and 4B. The encoder COD can for example correspond to the encoder COD described in relation to FIG. 5.

According to a particular embodiment of the invention, the steps of the encoding method are implemented by computer program instructions. For this purpose, the encoding device COD has the standard architecture of a computer and notably comprises a memory MEM, a processing unit UT, equipped for example with a processor PROC, and driven by the computer program PG stored in the memory MEM. The computer program PG comprises instructions for implementing the steps of the encoding method as described above, when the program is executed by the processor PROC.

At initialisation, the code instructions of the computer program PG are for example loaded into a memory before being executed by the processor PROC. In particular, the processor PROC of the processing unit UT implements the steps of the encoding method described above, according to the instructions of the computer program PG.

FIG. 9 shows the simplified structure of a device for processing multi-view video data DTV adapted to implement the method for processing multi-view data according to any one of the particular embodiments of the invention previously described, in particular in relation to FIGS. 2, 3A and 3B. The device for processing multi-view video data DTV can for example correspond to the synthesis module SYNTH described in relation to FIG. 5, or to a device comprising the synthesis module SYNTH and the decoder DEC of FIG. 5.

According to a particular embodiment of the invention, the device for processing multi-view video data DTV has the standard architecture of a computer and notably comprises a memory MEM0, a processing unit UT0, equipped for example with a processor PROC0, and driven by the computer program PG0 stored in the memory MEM0. The computer program PG0 comprises instructions for implementing the steps of the method for processing multi-view video data as described above, when the program is executed by the processor PROC0.

At initialisation, the code instructions of the computer program PG0 are for example loaded into a memory before being executed by the processor PROC0. In particular, the processor PROC0 of the processing unit UT0 implements the steps of the method for processing multi-view video data described above, according to the instructions of the computer program PG0.

According to a particular embodiment of the invention, the device for processing multi-view video data DTV comprises a decoder DEC adapted to decode one or more encoded data streams representative of a multi-view video.
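
Bringing the above together, a minimal decoding-side sketch of the processing flow is given below; every helper name is a hypothetical stand-in for the decoder DEC and the synthesis module SYNTH, under the assumption that the mode information item has already been obtained, and not an implementation taken from the description.

    # Illustrative sketch of the processing flow: obtain the synthesis data
    # item according to the signalled mode, then synthesize part of an
    # intermediate view. All helper names are hypothetical.

    def estimate_from_image(image):
        # Stub: decoder-side derivation of the synthesis data item
        # (second obtaining mode), e.g. depth estimation from the texture.
        return [[0] * len(row) for row in image]

    def synthesize(image, synthesis_data):
        # Stub: stands in for view synthesis of an intermediate view
        # (e.g. a depth-image-based rendering algorithm).
        return image

    def process(info_item, stream, reconstructed_image):
        if info_item == "first":
            # First obtaining mode: the synthesis data item (e.g. part of
            # a depth map) is decoded from the encoded data stream.
            synthesis_data = stream["synthesis_data"]
        else:
            # Second obtaining mode: the synthesis data item is obtained
            # from the reconstructed encoded image itself.
            synthesis_data = estimate_from_image(reconstructed_image)
        return synthesize(reconstructed_image, synthesis_data)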

CLAIMS

1. A method for processing multi-view video data, the method comprising: obtaining, for at least one block of an image of a view encoded in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode, said at least one synthesis data item being used to synthesize at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream, said first obtaining mode corresponding to decoding the at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image, obtaining the at least one synthesis data item according to the obtaining mode specified by said at least one information item, and synthesizing at least one part of an image of said intermediate view from at least said reconstructed encoded image and said at least one synthesis data item obtained.

2. The method for processing multi-view video data according to claim 1, wherein said at least one synthesis data item corresponds to at least one part of a depth map.

3. The method for processing multi-view video data according to claim 1, wherein said at least one information item specifying the mode for obtaining the synthesis data item is obtained by decoding a syntax element.

4. The method for processing multi-view video data according to claim 1, wherein said at least one information item specifying the mode for obtaining the synthesis data item is obtained from at least one data item encoded for the reconstructed encoded image.

5. The method for processing multi-view video data according to claim 4, wherein the obtaining mode is selected from among the first obtaining mode and the second obtaining mode based on a value of a quantization parameter used to encode at least said block.

6. The method for processing multi-view video data according to claim 1, further comprising, in response to said at least one information item specifying that the synthesis data item is obtained according to the second obtaining mode: decoding at least one control parameter from the encoded data stream, applying said control parameter when obtaining said synthesis data item according to the second obtaining mode.

7. A device for processing multi-view video data, comprising a processor configured to: obtain, for at least one block of an image of a view encoded in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode, said at least one synthesis data item being used to synthesize at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream, said first obtaining mode corresponding to decoding the at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image, obtain the at least one synthesis data item according to the obtaining mode specified by said at least one information item, and synthesize at least one part of an image of said intermediate view from at least said reconstructed encoded image and said at least one synthesis data item obtained.

8. A terminal comprising a device according to claim 7.

9. A method for encoding multi-view video data, the method comprising: determining, for at least one block of an image of a view in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode, said at least one synthesis data item being used to synthesize at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream, said first obtaining mode corresponding to decoding the at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image, and encoding said image in the encoded data stream.

10. The method for encoding multi-view video data according to claim 9, comprising encoding in the data stream a syntax element associated with said information item specifying a mode for obtaining the synthesis data item.

11. The method for encoding multi-view video data according to claim 9, further comprising, in response to the information item specifying that the synthesis data item is obtained according to the second obtaining mode: encoding in the encoded data stream at least one control parameter to be applied when obtaining said synthesis data item according to the second obtaining mode.

12. A device for encoding multi-view video data, comprising a processor and a memory configured to: determine, for at least one block of an image of a view in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode, said at least one synthesis data item being used to synthesize at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream, said first obtaining mode corresponding to decoding the at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image, and encode said image in the encoded data stream.

13. (canceled)

14. A non-transitory computer-readable storage medium, comprising instructions of a computer program stored thereon which when executed by a processor of a device, configure the device to implement a method of processing multi-view video data, the method comprising: obtaining, for at least one block of an image of a view encoded in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode, said at least one synthesis data item being used to synthesize at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream, said first obtaining mode corresponding to decoding the at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image, obtaining the at least one synthesis data item according to the obtaining mode specified by said at least one information item, and synthesizing at least one part of an image of said intermediate view from at least said reconstructed encoded image and said at least one synthesis data item obtained.

15. A non-transitory computer-readable storage medium, comprising instructions of a computer program stored thereon which when executed by a processor of a device, configure the device to implement a method for encoding multi-view video data, the method comprising: determining, for at least one block of an image of a view in an encoded data stream representative of the multi-view video, at least one information item specifying a mode for obtaining at least one synthesis data item, from among a first obtaining mode and a second obtaining mode, said at least one synthesis data item being used to synthesize at least one image of an intermediate view of the multi-view video, said intermediate view not being encoded in said encoded data stream, said first obtaining mode corresponding to decoding the at least one information item representative of the at least one synthesis data item from the encoded data stream, said second obtaining mode corresponding to obtaining the at least one synthesis data item from at least said reconstructed encoded image, and encoding said image in the encoded data stream.